README FILE for **`The End of the American Dream? Inequality and Segregation in US Cities`**


## Hardware and Software Declaration
- All analyses and data‐processing routines were executed on a Windows 11 workstation equipped with an Intel Core i7‑10700 CPU (2.9 GHz) and 32 GB of RAM.
- Statistical computing was performed using Stata-MP 18.0 and R version 4.2.2.



## Steps for Replication

1. **Download Data**  
   Retrieve all required raw data files and place them in the `Raw_Data/` directory, following the instructions in `Data_download.xlsx`.
   - Within `Social Explorer/` (inside `Raw_Data/`), create subfolders as specified in the "Folder" column of `Data_download.xlsx`.
   - Store each year's data in its own subfolder (for example, `Social Explorer/Income/2010/` or `Social Explorer/Education/1980/`) so that files for different years stay separate.


2. **Configure File Paths**  
   Open `Master.do` (located in the `codes/` directory) and update the file‑path definitions at the top of the script so they point to your local `Raw_Data/`, `Output/`, and other project directories.

3. **Execute Master Script**  
   From within Stata, run `Master.do`. This master script orchestrates the entire replication workflow:  
   - Cleans and processes raw data  
   - Generates all figures  
   - Calculates calibration targets  
   - Produces the tables used in the manuscript  

   Each component of the workflow is implemented in a separate `.do` file, organized into subfolders. You may also run these scripts individually if you wish to reproduce specific parts of the analysis.
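
As a rough illustration of steps 1 and 2, the path block at the top of `Master.do` and the year-specific `Social Explorer/` subfolders might be set up along the following lines. The global names (`root`, `raw`, `output`, `codes`) and the example root path are hypothetical; use whatever names `Master.do` actually defines. The subfolder names simply repeat the examples given in step 1.

```stata
* Hypothetical path definitions; the names used in Master.do may differ
global root   "C:/replication"      // edit to your local project root
global raw    "$root/Raw_Data"
global output "$root/Output"
global codes  "$root/codes"

* Stata's mkdir creates one directory level at a time.
* capture suppresses the error raised when a folder already exists.
capture mkdir "$output"
capture mkdir "$raw/Social Explorer"
capture mkdir "$raw/Social Explorer/Income"
capture mkdir "$raw/Social Explorer/Income/2010"
capture mkdir "$raw/Social Explorer/Education"
capture mkdir "$raw/Social Explorer/Education/1980"
```

Forward slashes work in Stata paths on Windows as well as on macOS and Linux, which avoids problems with backslashes next to macro references. Once the paths are set, running `do Master.do` from the `codes/` directory executes the full workflow.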

---

### Description of Folders

#### `./Raw_Data/`
Contains all raw input data files used in the replication.

- **`2003_crosswalk_all_id_bc.dta`** — County-to-metro crosswalk
- **`inc_bracket.dta`** — Income‐bracket definitions for census years
- **`saiz_elas.dta`** — Elasticity estimates from Saiz (2010)
- **`./BLS/BLS_1980.dta`** — Industry‐level employment data by census tract for 1980
- **`./BLS/BLS_2010.dta`** — Industry‐level employment data by census tract for 2010
- **`Social Explorer/`** — Census and demographic data downloaded from Social Explorer



#### `./Data_Cleaning/`
- **`clean_census_tract.do`**  
  Imports raw census‑tract–level data, performs cleaning and variable construction, and saves intermediate and final datasets to `./Output/` (see "Output Files" below).  
  **Note:** We recode the county code for Miami-Dade County, FL from 12025 to 12086 so that it stays consistent with our crosswalk.
- **`compute_gini.do`**  
  Calculates the Gini coefficient for each metropolitan area from the cleaned tract‑level data; results saved to `./Output/`.  
- **`clean_metro.do`**  
  Merges and cleans metropolitan‑level dissimilarity, Gini, and racial composition data; output saved to `./Output/`.  
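
The Miami-Dade recode mentioned for `clean_census_tract.do` can be illustrated on a toy dataset. This is only a sketch: the variable name `countycode`, its numeric storage type, and the example rows are assumptions, not taken from the actual script.

```stata
* Toy data; the variable name countycode is hypothetical
clear
input long countycode str20 name
12025 "Dade (old code)"
12086 "Miami-Dade"
12011 "Broward"
end

* Map the old Dade County code to the Miami-Dade code used in the crosswalk
replace countycode = 12086 if countycode == 12025

* Check that no observation still carries the old code
assert countycode != 12025
```

If the county code is stored as a string in your data, the comparison would need quoted values (for example `"12025"`) instead of numeric ones.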

#### `./Calibration/`
- **`tab1_calibration_targets.do`**  
  Computes all calibration targets reported in Table 1 of the paper; outputs saved to `./Output/`.  

#### `./Figures/`
- **`fig1.do`**, **`fig3.do`**, **`fig4a.do`**, **`fig4b.do`**, **`fig5.do`**, **`fig16.do`**  
  Each script generates the corresponding figure in the manuscript.  

#### `./Regressions/`
- **`reg_gini_dissim.do`**  
  Runs all regression analyses involving Gini and dissimilarity indices and produces summary tables; tables are saved to `./Output/`.  
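
For orientation, the two indices entering these regressions have standard textbook definitions, shown below. Whether the scripts use exactly these forms, rather than a grouped-data approximation based on the income brackets in `inc_bracket.dta`, is an assumption here; check `compute_gini.do` for the formulas actually applied.

$$
G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert y_i - y_j \rvert}{2 n^{2} \bar{y}},
\qquad
D = \frac{1}{2}\sum_{t=1}^{T} \left\lvert \frac{a_t}{A} - \frac{b_t}{B} \right\rvert
$$

Here $y_i$ is the income of unit $i$ in a metropolitan area with $n$ units and mean income $\bar{y}$; $a_t$ and $b_t$ are the counts of two groups in tract $t$, and $A$ and $B$ are their metro-wide totals. With nonnegative incomes both indices lie between 0 and 1, where 0 means perfect equality (for $G$) or identical distribution of the two groups across tracts (for $D$).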

---

## Output Files

All cleaned and processed datasets are written to `./Output/` as follows:

- **`clean_census_tract.dta`**  
  Cleaned tract‑level dataset with constructed variables.  
- **`metro_gini_all.dta`**  
  Metropolitan‑level dataset containing Gini measures for each metro area.  
- **`metro_level_gini_dissim.dta`**  
  Metro‑level dataset combining dissimilarity, Gini, and racial composition variables.  
- **`popweight.dta`**  
  Metro‑level population totals used as weights in various calculations.  


